SU/X and SU/P are knowledge-based programs which
employ pattern-invoked inference methods. Both
tasks are concerned with the interpretation of
large  quantities  of  digitized  signal  data.  The 
task  of  SU/X  is  to  understand  "continuous  signals", 
that  is,  signals  which  persist  over  time.  The  task 
of  SU/P  is  to  interpret  protein  x-ray 
crystallographic  data.  Some  features  of  the  design 
are:  (1)  incremental  interpretation  of  data 
employing  many  different  pattern-invoked  sources  of 
knowledge,  (2)  production  rule  representation  of 

1 INTRODUCTION AND SUMMARY


This paper describes a design of knowledge-based
programs which employ pattern-invoked inference methods.
Domain and strategy knowledge are represented as production
rules to be invoked when appropriate situations arise in
the problem-solving process. The same basic design
philosophy is utilized in two task domains, both of which
are concerned with the interpretation of large volumes of
digitized physical signals. The tasks are (1) the
understanding of continuous signals produced by objects and
(2) the interpretation of protein x-ray crystallographic
data in terms of a three-dimensional model of the molecule.
The  programs  associated  with  these  tasks  are  called  SU/X 
and  SU/P,  respectively. 

Some  of  the  design  concepts  in  SU/X  and  SU/P  are 
rooted  in  the  HEARSAY-II  program  [4,  6-7].  Concepts  which 
have been borrowed are: (a) a global data base, called the
blackboard,  for  the  integration  of  knowledge  sources  and 
(b)  a multilevel  representation  of  the  solution  hypotheses. 
These  basic  concepts  are  integrated  into  a system  design 
that  emphasizes:  (a)  the  representation  of  knowledge  in 
production  rules,  (b)  the  representation  of  the  control 
structure  as  sources  of  knowledge  related  to 
problem-solving methods and strategies, (c) the capability
of  the  program  to  explain  its  reasoning  steps,  and  (d)  a 
level  of  generality  of  the  basic  design  concepts  leading  to 
application  in  different  tasks  or  domains. 

1.1  Major  Themes 

The  "understanding"  of  physical  signals  often  requires 
using  information  not  present  in  the  signal  data 
themselves.  Examples  of  such  information  are:  (a)  in  the 
continuous-signal  problem,  the  characteristics  of  the 
signal-producing objects, (b) in the protein-modeling
problem, the amino acid sequence and the stereochemical and
protein chemistry constraints. Each such source of
knowledge may at any time provide an inference which serves
as a basis for another knowledge source to make yet another
inference, and so on, until all relevant information has
been  used  and  appropriate  inferences  have  been  drawn. 

Essential  to  the  operation  of  the  program  is  its  model 
of the developing hypothesis. The model is a
symbol-structure that is built and maintained by the
program, contains what is known about the unfolding
situation, and thus provides a context for the ongoing
analysis. The model is used as a reference for the
interpretation of new information, assimilation of new
events, and generation of expectations concerning future
events. It is the program's "cognitive flywheel".

SU/X and SU/P are "knowledge-based" programs (footnote
1).  Their  powers  are  largely  derived  from  the  knowledge 
given  to  them  by  "expert"  human  analysts  and/or  "expert" 
algorithms.  Major  problems  in  the  design  of  such  systems 
show  up  vividly  in  these  two  programs: 

a.  Knowledge  acquisition.  This  is  a task  of 
systematically  ferreting  out  the  informal  and 
semiformal  knowledge  held  by  the  expert.  The 
breadth  and  sheer  volume  of  an  expert's  knowledge 
is  what  makes  his  analysis  general  and  powerful; 
yet,  obtaining  that  knowledge,  which  he  often  does 
not  realize  he  is  using,  is  a painstaking  and 
inexact process.

b.  Knowledge  representation.  Having  acquired  the 
knowledge  in  its  "human"  form,  we  must  represent 
it  in  a form  that  is  convenient  and  efficient  for 
machine  processing  and  at  the  same  time  reasonably 
"natural"  (bear  in  mind  that  the  knowledge  rarely 
boils  down  merely  to  a set  of  numbers)  --  a 
difficult  and  time-consuming  task. 

c. Integration of multiple, diverse sources of
knowledge. Program and information structures
must  be  created  by  which  the  various  kinds  of 
knowledge  can  "work  together"  to  form  a coherent 
and  accurate  hypothesis.  When  the  knowledge 
exists  at  many  different  levels  of  abstraction  and 
aggregation  (say,  from  alpha-helix  substructure 
all the way down to electron density values in an
electron  density  map),  one  has  a major  design 
problem. 

1.2  Major  Terms  and  Concepts 

The  task  of  "understanding"  the  data  is  accomplished 
at  various  levels  of  analysis.  These  levels  are  exhibited 
in  Figure  1.1  for  the  continuous-signal  interpretation 
problem and in Figure 1.2 for the protein-modeling problem.
The most integrated -- the highest -- levels for the two
problems involve the description of the signal-producing
objects,  and  the  three-dimensional  model  of  the  protein. 
The  lowest  levels,  that  is,  the  levels  closest  to  the  data, 
consist  of  the  line  features  derived  from  the  signal  data, 
and  the  atoms  and  their  coordinates  in  three  space. 

At each level, the units of analysis are the
hypothesis elements. These are symbol-structures that
summarize  what  the  available  evidence  indicates  in  terms 
that  are  meaningful  at  that  particular  level. 

Bridging between the levels of analysis are sources of
knowledge [4,7]. A knowledge source (KS) is capable of
putting forth the inference that some hypothesis elements
present at its "input" level imply some particular
hypothesis element(s) at its "output" level. A source of
knowledge contains not only the knowledge necessary for
making its own specialized inferences, but also the
knowledge necessary for checking the inferences made by
other sources of knowledge. The inferences which draw
together hypothesis elements at one level into a hypothesis
element at a higher level (or which operate in the other
direction) are represented symbolically as links between
levels (see Figures 1.1 and 1.2). The resulting network,
rooted in the input data and integrated at the highest
level into a description of the hypothesized problem
solution, is called the current best hypothesis, or the
hypothesis for short. Each source of knowledge holds a
considerable body of specialized information that a human
expert would generally consider "ordinary". Sometimes this
is relatively "hard" knowledge or "textbook" knowledge.
Also represented are the heuristics, that is, "rules of
good guessing" a human expert develops in his area of
expertise. These "judgmental" rules are generally
accompanied by estimates from human experts concerning the
weight that each rule should carry in the analysis.

Each KS is composed of "pieces" of knowledge. By a
piece of knowledge we mean a production rule, that is, an
IF-THEN  type  of  implication  formula.  The  "IF"  side,  also 
called  the  situation  side,  specifies  a set  of  conditions  or 
patterns  for  the  applicability  of  the  particular  rule.  The 
"THEN"  side,  also  called  the  action  side,  symbolizes  the 
implications  to  be  drawn  (more  precisely,  various 
processing  events  to  be  caused)  if  the  "IF"  conditions  are 
met. (Refer to [2] for an excellent overview of production
rules.)
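
The situation-action form of a production rule can be illustrated with a minimal Python sketch. This is not the SU/X or SU/P implementation; the names Rule and fire_applicable, the "amplitude" attribute, and the event strings are all hypothetical, chosen only to show an "IF" side tested against a description and a "THEN" side that returns processing events.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical: a flat attribute description of one hypothesis element.
State = Dict[str, float]


@dataclass
class Rule:
    name: str
    situation: Callable[[State], bool]    # the "IF" (situation) side
    action: Callable[[State], List[str]]  # the "THEN" (action) side


def fire_applicable(rules: List[Rule], state: State) -> List[str]:
    """Collect the processing events caused by every rule whose IF side holds."""
    events: List[str] = []
    for rule in rules:
        if rule.situation(state):
            events.extend(rule.action(state))
    return events


# Two illustrative rules; the threshold of 5 is an arbitrary assumption.
rules = [
    Rule("strong-line",
         lambda s: s.get("amplitude", 0) > 5,
         lambda s: ["propose-source"]),
    Rule("weak-line",
         lambda s: s.get("amplitude", 0) <= 5,
         lambda s: ["request-more-data"]),
]

events = fire_applicable(rules, {"amplitude": 7})
```

Keeping the two sides as separate callables mirrors the split described above: the situation side only tests applicability, and the action side only names the events to be caused.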

The knowledge of how to perform, that is, how to use
the available knowledge sources, is another kind of
knowledge that experts possess. This type of knowledge is
also represented in the system in the form of
control/strategy production rules, which promote
flexibility in specifying and modifying strategies of
analysis.

Hypothesis  formation  is  an  "opportunistic"  process. 
Both data-driven and model-driven hypothesis formation
techniques are used within the general hypothesize-and-test
paradigm. One of the tasks of the control/strategy
knowledge source is to determine the applicability of these
methods to different situations. The unit of processing
activity is the event. Events symbolize such things as
"what inferences to make", "what symbol-structures to
modify", "what to look for in the data", and so on. The
basic control loop for these event-driven programs is one
in which lists of events (events sometimes include new
data) and the set of control/strategy rules are
periodically scanned to determine the "next thing to do"
(footnote 2).
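
The basic control loop described above might be sketched as follows. This is an illustration under stated assumptions, not the authors' code: events are modeled as (type, payload) tuples, each control/strategy rule is a hypothetical (applies, respond) pair, and the first matching rule is taken to decide the "next thing to do".

```python
from collections import deque


def run_control_loop(initial_events, strategy_rules, max_steps=100):
    """Scan pending events against the strategy rules until none remain.

    Returns the events in the order they were processed.
    """
    pending = deque(initial_events)
    trace = []
    while pending and len(trace) < max_steps:
        event = pending.popleft()
        trace.append(event)
        for applies, respond in strategy_rules:
            if applies(event):
                # Assumption: the first applicable rule wins.
                pending.extend(respond(event))
                break
    return trace


# Hypothetical rules: new data yields a line, a line yields a proposed source.
strategy_rules = [
    (lambda e: e[0] == "new-data", lambda e: [("line-found", e[1])]),
    (lambda e: e[0] == "line-found", lambda e: [("source-proposed", e[1])]),
]

trace = run_control_loop([("new-data", "segment-1")], strategy_rules)
```

The max_steps bound is only a safeguard for the sketch; an event that matches no rule is recorded in the trace and produces nothing further.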

In  the  following  sections  we  discuss  issues  related  to 
the  representation  of  the  hypothesis,  the  knowledge 
sources,  and  the  control  structure.  Before  continuing, 
however,  we  will  briefly  describe  the  two  tasks  that  have 
been  implemented  and  list  some  guidelines  for  choosing 
applications  in  which  this  type  of  system  organization  may 
be useful.

2 THE  TASKS 

2.1  Interpretation  of  Continuous-Signals  (SU/X) 

The signal-understanding program performs analysis of
data  derived  from  a digitized  plot  of  continuous  signals, 
the  interpretation  of  which  is  to  a considerable  degree  a 
function  of  time.  Examples  of  data  having  this 
characteristic  are  electromagnetic  and  acoustic  signals, 
and  signals  from  hospital  patients  monitored  in  an 
intensive care unit. The "front-end" signal-processing
hardware  and  software  detect  energy  "packets"  appearing  at 
various  spectral  frequencies,  and  follow  these  packets  in 
time.  The  current  system  is  designed  to  analyze  a digitized 
description  of  these  data.  At  the  end  of  each  time  period, 
say, a few minutes, the user is given an integrated
analysis of the interpreted objects within its data
purview. [5]

2.2  Interpretation  of  Three-Dimensional  Signal  Data: 
Protein  Crystallography  (SU/P) 

The  task  of  this  program  is  to  infer  three-dimensional 
models  of  protein  molecules.  The  model  is  derived  from  an 
interpretation of the electron density map of the
crystallized  protein.  The  density  map  is,  in  turn,  derived 
from  x-ray  diffraction  data.  These  data  typically  yield  a 
poorly  resolved  distribution  of  the  electron  density  within 
the protein molecule, and the locations of individual atoms
are generally not identifiable. Traditionally, the protein
crystallographer embodies his interpretation of the
electron density map in a "ball and stick" molecular model
fashioned from metal parts. These parts are strung together
to build a model which conforms to the electron density map
and  is  also  consistent  with  protein  chemistry  and 
stereochemical  constraints.  The  current  system  tries  to 
simulate  humans  who  build  models  incrementally  from  the 
most  "obvious"  regions  of  the  electron  density  map.  The 
incremental,  opportunistic  strategies  used  by  our  program 
to form hypotheses closely resemble the problem-solving
methods used by human model builders. Refer to [3] for a
more complete description of the problem.

3 SUITABLE  APPLICATION  AREAS 

Building  a signal  interpretation  system  within  the 
program  organization  summarized  above  can  best  be  described 
as  "opportunistic"  analysis.  Bits  and  pieces  of 
information  must  be  used  as  opportunity  arises  to  build 
slowly  a coherent  picture  of  the  world  --  much  like  putting 
a jigsaw  puzzle  together.  Some  thoughts  on  the 
characteristics  of  problems  suited  to  this  approach  are 
listed below:

1. Large amounts of signal data need to be analyzed.
Examples  include  the  interpretation  of  speech  and 
other  acoustic  signals,  X-ray  and  other  spectral 
data,  radar  signals,  photographic  data,  etc.  (A 
variation  involves  understanding  a large  volume  of 
symbolic  data;  for  example,  the  maintenance  of  a 
global  plotboard  of  air  traffic  based  on  messages 
from  various  air  traffic  control  centers.) 

2. Formal or informal interpretive theories exist.
By  informal  interpretive  theory  we  mean  lore  or 
heuristics  which  human  experts  bring  to  bear  in 
order  to  "understand"  the  data.  These  inexact  and 
informal  rules  are  incorporated  as  KSs  in 
conjunction  with  more  formal  knowledge  about  the 
domain. 

3. Task domain can be decomposed hierarchically in a
"natural  way"  [ H 1~7  In  many  cases  the  domain  can 
be  decomposed  into  a series  of  data  reduction 
levels,  where  various  interpretive  theories  (in 
the  sense  described  above)  exist  for  transforming 
data  from  one  level  to  another. 

4. "Opportunistic" strategies must be used. That is,
there  is  no  computationally  feasible  "legal  move 
generator"  that  defines  the  space  of  solutions  in 
which  pruning  and  steering  take  place.  Rather,  by 
reasoning  about  bits  and  pieces  of  available 
evidence,  one  can  incrementally  generate  partial 
hypotheses that will eventually lead to a more
global  solution  hypothesis. 

3.1  Data-Driven  vs  Model-Driven  Hypothesis  Formation 
Methods 

We  have  combined  data-  and  model-driven  methods  of 
hypothesis formation in the design of SU/X and SU/P. By
"data-driven" we mean "inferred from the input data". By
"model-driven" we mean "based on expectation" where the
expectation is inferred from knowledge about the domain.
For example, a hypothesis generated by a KS which infers an
amino acid sidechain from the electron density data is a
data-driven hypothesis. On the other hand, a hypothesis
about the existence of an amino acid sidechain that is
deduced from topological knowledge of the protein is a
model-based hypothesis. In the former case, the data is
used as the basis for signal analysis; in the latter case,
the primary data is used solely to verify the expectation.

There  are  no  hard-and-fast  criteria  for  determining 
which  of  the  two  hypothesis  formation  methods  is  more 
appropriate  for  a particular  signal-processing  task.  The 
choice  depends,  to  a large  extent,  on  the  nature  of  the  KSs 
which  are  available  and  on  the  power  of  the  analysis  model 
available.  Our  experience  points  strongly  toward  the  use 
of a combination of these techniques; some KSs are
strongly data dependent while others are strongly model
dependent. In the continuous-signal interpretation
program, for example, the majority of the inferences are
data-driven, with occasional model-driven inferences. The
converse is true in the protein model-building program, which
places more emphasis on model-driven hypothesis generation. The
following  are  guidelines  we  have  used  in  determining  which 
of  the  two  methods  is  more  appropriate: 

1.  Signal  to  Noise  Ratio.  Problems  which  have 
inherently  low  S/N  ratios  are  better  suited  to 
solutions  by  model-driven  programs;  the  converse 
is  true  for  problems  with  high  S/N  ratios. 


2. Availability of a model. A model, sometimes
referred  to  as  "the  semantics  of  the  task  domain", 
can  be  available  in  various  forms:  (1)  input  to 
an  abstract  level  of  the  hypothesis  structure,  (2) 
general  knowledge  about  the  task  domain,  or  (3) 
specific  knowledge  about  the  particular  task.  In 
the  protein  crystallography  problem,  for  instance, 
the  amino  acid  sequence  (the  topology  of  the 
protein)  serves  as  a model  for  guiding  the 
interpretation  of  the  primary  data.  However,  in 
the  continuous-signal  interpretation  problem,  the 
model is drawn from general knowledge about the
signal sources and from other relevant external
sources of information that serve to define the
context. If a reliable model is available, the
data-interpretation KSs can be used as verifiers
rather than generators of inferences; this reduces
the computational burden on the signal-processing
programs  at  the  "front  end". 

4 THE NATURE OF THE HYPOTHESIS


In  order  to  integrate  a diversity  of  knowledge  about 
the  task  domain,  the  domain  is  decomposed  hierarchically 
into  levels  of  analysis.  We  will  describe  briefly  some  of 
the basic ideas on the nature of the hypothesis (footnote
3).


A signal  interpretation  problem  can  be  viewed  as  a 
problem  of  "transforming"  signals  representing  an  object 
into a symbolic description of the object on a more
abstract  level.  We  use  the  word  "transformation"  to  mean  a 
shift  from  one  representation  of  an  object  (digitized 
signals)  to  another  (symbolic  description)  using  any  formal 
or  informal  rules. 

The  data  structure  hierarchy  reflects  a plan  for  the 
utilization  of  the  various  data  transformation  KSs  which 
contribute  to  the  total  data  interpretation  process. 
Generally these transformational steps involve data
reductions  of  the  primary  data  in  a stepwise  fashion  from 
the  detailed  to  the  more  abstract  description  of  the 
object.  However,  we  have  found  that  some  of  the  most  useful 
KSs  generate  inferences  spanning  several  levels.  For 
example, in the protein-modeling problem, a human can "see"
in  the  electron  density  data,  helical  substructures  without 
knowing  or  observing  the  details  of  each  atom  placement. 
This  kind  of  knowledge  is  usually  very  specific  to 
situations;  human  experts  know,  and  use,  many  of  these 
specialized,  informal  bodies  of  knowledge. 

The  data  structure  of  the  solution  hypothesis  is  a 
linked  network  of  nodes,  where  each  node  (hypothesis 
element)  represents  a meaningful  aggregation  of  lower  level 
hypothesis  elements.  A link  between  any  two  hypothesis 
elements  represents  a result  of  some  action  by  a KS  and 
indirectly  points  to  the  KS  itself.  A link  has  associated 
with  it  directional  properties.  In  general,  the  direction 
indicates one of the following: (1) A link which goes
from a more abstract to a less abstract level of the
hypothesis is referred to as an "expectation-link". The
node at the end of an expectation-link is a model-based
hypothesis element, and the link represents "support from
above" (i.e. the reason for proposing the hypothesis
element is to be found at the higher level). (2) A link
which goes in the opposite direction, from lower levels of
abstraction to higher, is referred to as a
"reduction-link". The node at the end of a reduction-link
is a data-based hypothesis element, and the link represents
"support from below" (i.e. the reason for proposing the
hypothesis element is to be found at a lower level).
(These directions correspond loosely to "top-down" and
"bottom-up" path generation.) Examples of KSs and
hypothesis elements generated by the KSs are shown in
Figure 2.
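
The two kinds of link can be made concrete with a small Python sketch. The representation is an assumption made for illustration: levels are encoded as integers (larger means more abstract), and the names Element, Link, link_kind and support_for, as well as the example elements and KS names, are hypothetical rather than taken from SU/X or SU/P.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Element:
    name: str
    level: int  # assumption: a larger number means a more abstract level


@dataclass(frozen=True)
class Link:
    source: Element
    target: Element  # the hypothesis element proposed by the KS
    ks: str          # name of the KS whose action produced the link


def link_kind(link: Link) -> str:
    """Classify a link by the direction in which it crosses levels."""
    if link.source.level > link.target.level:
        return "expectation-link"  # support from above (model-based)
    if link.source.level < link.target.level:
        return "reduction-link"    # support from below (data-based)
    raise ValueError("same-level links are not modeled in this sketch")


def support_for(element: Element, links: List[Link]) -> List[Tuple[str, str]]:
    """List (link kind, KS name) for every link ending at the element."""
    return [(link_kind(l), l.ks) for l in links if l.target == element]


helix = Element("helix", 3)
sidechain = Element("sidechain", 2)
density_peak = Element("density-peak", 1)

links = [
    Link(helix, sidechain, "topology-KS"),        # expectation from above
    Link(density_peak, sidechain, "density-KS"),  # reduction from below
]
support = support_for(sidechain, links)
```

Because each link carries the name of the KS that produced it, the support for any element can be traced back to the knowledge that proposed it, which is the property the text attributes to links.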

The protein-modeling problem posed some difficulties
in  the  design  of  its  hypothesis  structure.  These  can  be 
attributed  to  several  factors.  First,  the  decomposition  of 
the  solution  space  (the  three-dimensional  model)  and  the 
abstractions  of  the  primary  data  (electron  density)  do  not 
result  in  one  consistent  data  hierarchy  but  result  in  two 
hierarchies.  Second,  the  two  hierarchies  overlap 
semantically at some levels but are not representationally
compatible. Third, very little is known about mapping the
object  between  the  two  spaces.  As  indicated  in  Figure  3, 
however,  the  two  hierarchies,  with  a network  of  links,  can 
be  merged  into  a single  representation  of  the  problem 
space. This representation indicates that the hypothesis need
not be represented as a strict hierarchy; it can be
represented as a more general network of related elements.
(Refer to [3] for a more detailed description.)

5 THE  NATURE  OF  THE  "CONTROL" 

A system's  performance  depends  both  on  the  competence 
of  each  KS  and  on  the  utilization  of  these  KSs  within  the 
context  of  the  goals  of  the  task  domain. 

There are two separate but equally important issues
involved in the design of a knowledge-based performance
program:  (1)  the  availability  and  the  quality  of  the 
specialist  KSs  that  cooperate  in  the  building  of  a 
hypothesis.  (These  KSs  define  the  hierarchy  of 
abstractions  of  the  hypothesis.)  (2)  the  optimal 
utilization  of  these  KSs.  If  we  view  the  KSs  as  resources 
that  are  available  for  solving  a problem,  then  the  optimal 
resource  allocation  strategy  is  determined  by  the  quality, 
the  size,  and  the  cost  of  the  KSs,  and  the  state  of  the 
current  hypothesized  solution.  The  control  structure  must 
be  sensitive  to,  and  be  able  to  adjust  to,  the  numerous 
possible  solution  states  which  arise  in  the  course  of 
solving  a problem.  Within  this  viewpoint,  then,  what  is 
commonly  called  the  "control  structure"  becomes  another 
totally  domain-dependent  knowledge  source.  The  notion  of  a 
"hierarchic  control"  is  an  attempt  to  come  to  grips  with 
the  issues  of  resource  allocation  and  "control"  strategies. 

5.1 Hierarchically Organized Control Structures

In  a "hierarchically  organized  control  structure," 
problem-solving activities themselves form a hierarchy of
knowledge  necessary  for  solving  the  problem.  On  the  lowest 
level  is  a set  of  knowledge  sources  the  tasks  of  which  are 
to  make  the  primary  inferences  in  the  hypothesis  network 
previously  described.  We  refer  to  this  level  of  knowledge 
as the "hypothesis-formation" level. At the next level are
"meta"  KSs  that  have  knowledge  about  the  capabilities  of 
the  KSs  in  the  hypothesis-formation  level.  We  refer  to 
this level as the "KS-activation" level; a KS on this level
represents  a policy  on  knowledge  utilization.  At  the 
highest  level  is  the  Strategy-KS  which  analyzes  the  quality 
of  the  current  solution  to  determine  what  region  of  the 
data to analyze next; it also determines what kind of
strategy to use.

Another  way  to  describe  this  organization  is  as 
follows:  The  KSs  are  organized  hierarchically  --  much  like 
the management structure in a corporate environment -- in
terms  of  the  scope  of  their  knowledge  and  the  specificity 
of  their  functions. 

Example:  A KS  capable  of  deciding  whether  to  look  for 
helices  or  to  continue  looking  for  a large  amino  acid 
sidechain  would  possess  a higher  level  of  knowledge 
than  a KS  whose  function  is  to  infer  the  placement  of 
atoms  of  some  amino  acid  sidechain.  It  is  a higher 
level  because  its  area  of  expertise  (choosing  the  best 
problem  solving  strategy  for  a given  situation),  is 
broader  in  scope  and  narrower  in  the  knowledge  of  the 
processing  specifics.  It  does  not  have,  and  it  need 
not  have,  any  knowledge  of  the  details  of  the 
execution of the problem-solving strategy it chooses.

This  control  hierarchy  should  be  clearly  distinguished 
from  the  hierarchy  of  hypothesis  levels.  The  hypothesis 
hierarchy  represents  an  a priori  plan  for  the  solution 
presented  by  a "natural"  decomposition  of  the  analysis 
problem.  The  control  hierarchy,  on  the  other  hand, 
represents the organization of the problem-solving
activities  necessary  for  the  formation  of  the  hypothesis. 
Figure 4 shows a general relationship between the
organization  of  the  hypothesis  structure  and  the 
organization  of  the  control  structure.  Table  1 summarizes 
the scope of KSs on each level of the control hierarchy.

5.2 Control Structure Implementation

All information needed by the different KSs is
contained in a global data structure called the
"blackboard". The "blackboard" concept has its origin in
HEARSAY [4] and is extended in SU/X and SU/P. The contents
of the blackboard in SU/X and SU/P consist of:

1.  The  current  best  hypothesis  (CBH) 

2. The Event-list: A list of changes in the
hypothesis which have not yet been processed by
any KS. An event also contains the name of the KS
and the identifier of the rule which caused the
change.

3. The Event: A global variable containing the
currently "active event", that is, an event which
is currently being processed by some KS. The
Event also represents the current focus of
attention.

4. The Problems-list: A list of unresolved problems
encountered by various KSs. Such problems range
from expected data not yet available, to
detectable "errors" in the program (e.g.
insufficient knowledge).

5. The Event history list: The Event, together with
its Predecessor and Successor events, forms a causal
chain of reasoning. In the continuous-signal
understanding problem, the Event history list is
sometimes used by KSs to analyze series of events
which occurred over a period of time. More
generally, it serves as a data base from which
reasoning traces are generated and "how" and "why"
questions answered. (See references [1,8] for some
examples of this type of trace.)
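
The five blackboard components listed above can be pictured as one record, as in the following Python sketch. It is a simplification under stated assumptions: the hypothesis is reduced to a plain dictionary, only predecessor links are stored, and the names Event, Blackboard, post, focus_next and why are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    change: str  # description of the change made to the hypothesis
    ks: str      # name of the KS that caused the change
    rule: str    # identifier of the rule that caused the change
    predecessor: Optional["Event"] = None


@dataclass
class Blackboard:
    hypothesis: dict = field(default_factory=dict)          # 1. CBH
    event_list: List[Event] = field(default_factory=list)   # 2. unprocessed
    event: Optional[Event] = None                            # 3. active event
    problems_list: List[str] = field(default_factory=list)  # 4. unresolved
    history: List[Event] = field(default_factory=list)      # 5. event history

    def post(self, change: str, ks: str, rule: str) -> Event:
        """Record a change; the active event becomes its predecessor."""
        ev = Event(change, ks, rule, predecessor=self.event)
        self.event_list.append(ev)
        return ev

    def focus_next(self) -> Event:
        """Make the oldest unprocessed event the focus of attention.

        Raises IndexError if the Event-list is empty.
        """
        self.event = self.event_list.pop(0)
        self.history.append(self.event)
        return self.event


def why(event: Optional[Event]) -> List[str]:
    """Walk the causal chain backwards to build a reasoning trace."""
    chain = []
    while event is not None:
        chain.append(f"{event.change} (by {event.ks}, rule {event.rule})")
        event = event.predecessor
    return chain


bb = Blackboard()
bb.post("line detected", "front-end", "R1")
bb.focus_next()
bb.post("source proposed", "line-KS", "R7")
explanation = why(bb.focus_next())
```

Storing the KS name and rule identifier with each event is what allows a trace of this kind to answer "why" questions by walking the chain of predecessors.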

5.2.1  Hypothesis  Formation  Level 

At the lowest level of control -- the most data
specific level -- are the inference-generating KSs, or the
specialist-KSs. Each specialist-KS has the task of
creating or modifying hypothesis elements, evaluating
inferences generated by other specialist-KSs, and
cataloging missing evidence which is essential for a KS
to generate meaningful inferences.

Each specialist-KS has access to the blackboard. Its
focus  of  attention  is  that  portion  of  the  blackboard 
containing  the  latest  change(s)  made  to  the  current 
hypothesis.  Although  a KS  has  access  to  the  entire 
hypothesis,  it  normally  "understands"  only  the  descriptors 
contained in two levels, its input level and its output
level. 


INFERENCE-GENERATION. Inference-generation is the
creation  or  modification  of  hypothesis  elements;  it  is  the 
"hypothesize"  part  of  the  hypothesize-and-test  paradigm. 
An  inference-generator  may  use  a data-driven  or 
model-driven  hypothesis  formation  method.  As  mentioned 
earlier,  a KS  is  represented  as  a set  of  production  rules 
consisting of "situation-action" pairs. The "situation"
for  the  inference-generator  is  a particular  state  of  those 
hypothesis  elements  containing  data  relevant  to  the  KS.  A 
match  between  a description  in  the  hypothesis  element  and 
the  situation-side  of  a rule  indicates  that  a KS  can  make 
some  conjectures  regarding  that  hypothesis  element.  When 
the  appropriate  KS  is  invoked,  the  "action"  part  will 
transform  the  current  hypothesis  to  a new  current 
hypothesis  either  by  adding  new  links  to  the  structure, 
creating  new  hypothesis  elements,  or  changing  the  attribute 
values of a hypothesis element (see Table 1 for a
summary).

INFERENCE-EVALUATION. Inference evaluation involves
the appraisal of inferences generated by other KSs; it is
the "test" part of the hypothesize-and-test paradigm. For
each inference level there is usually more than one
specialist-KS capable of generating inferences on that
level.  When  a KS  is  invoked  because  of  a particular  event, 
another  KS  may  already  have  processed  the  salient  event. 
In  such  a circumstance,  the  currently  active  KS  evaluates 
the  inference  generated  by  the  other  KS.  The  evaluation  can 
result  in  the  KS  agreeing  with,  disagreeing  with,  or  being 
indifferent  about  the  particular  inference  being  evaluated. 
If  there  is  agreement,  the  confidence  in  that  inference  is 
increased;  if  there  is  disagreement,  either  the  confidence 
value  is  decreased  or  an  alternative  hypothesis  is 
generated.  There  is  no  action  taken  for  "I  don't  know" 
situations. 
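
The effect of an evaluation on confidence can be sketched in Python as below. The numeric details are assumptions, since the text does not specify them: confidence is taken to lie between 0 and 1, and each verdict shifts it by a fixed, hypothetical step. Only the confidence-lowering response to disagreement is modeled; generating an alternative hypothesis is outside this sketch.

```python
def evaluate(confidence: float, verdict: str, step: float = 0.25) -> float:
    """Adjust the confidence in an inference after another KS evaluates it.

    Assumptions: confidence lies in [0, 1] and `step` is a fixed increment.
    """
    if verdict == "agree":
        return min(1.0, confidence + step)
    if verdict == "disagree":
        return max(0.0, confidence - step)
    if verdict == "indifferent":
        return confidence  # "I don't know": no action is taken
    raise ValueError(f"unknown verdict: {verdict}")


raised = evaluate(0.5, "agree")
lowered = evaluate(0.5, "disagree")
unchanged = evaluate(0.5, "indifferent")
```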

PROBLEM-CATALOGING.  Problem  cataloging  involves 
attempting  to  identify  missing  evidence  essential  for  a KS 
to generate meaningful inferences. If a KS is unable to
make  new  inferences  when  called  upon  to  do  so,  it  may  be 
due  to  lack  of  knowledge  about  the  particular  situation  or 
due  to  lack  of  necessary  information,  that  is,  the  current 
situation  does  not  meet  the  conditions  on  the  situation 
sides of the rules. If the specialist-KS is "ignorant"
then its knowledge-base needs to be augmented in some way.
If the cause is a lack of particular evidence, a KS
can request it by placing a notice on the Problems-list.
This calls the system's attention to a particular situation
in which a solution is possible "...if x were true." Since
a specialist-KS is not aware of the importance (or the
unimportance) of its own immediate needs within the general
framework of the solution, the decision to pursue or not to
pursue the needs of the specialist-KSs is made by a higher
level KS.
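
Problem cataloging can be illustrated with a short Python sketch. It assumes, purely for illustration, that the situation side of a rule can be summarized as a set of named pieces of evidence; the function name catalog_problems and the evidence names are hypothetical.

```python
from typing import Iterable, List, Tuple


def catalog_problems(ks_name: str,
                     required: Iterable[str],
                     available: Iterable[str],
                     problems_list: List[Tuple[str, str]]) -> List[str]:
    """Post a notice for each piece of evidence the KS needs but lacks.

    Returns the missing items; an empty result means the KS can proceed.
    """
    missing = sorted(set(required) - set(available))
    for item in missing:
        problems_list.append((ks_name, item))
    return missing


problems_list: List[Tuple[str, str]] = []
missing = catalog_problems(
    "sidechain-KS",
    required=["density-peak", "backbone-trace"],
    available=["density-peak"],
    problems_list=problems_list,
)
```

The specialist only records what it lacks; whether the request is worth pursuing is left to a KS on a higher level, as the text describes.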

5.2.2 KS-Activation Level

At the level immediately above the
hypothesis-formation level are the KS-activators, whose
task is to invoke the specialist-KSs as appropriate. The
KSs on this level represent various policies and
problem-solving strategies related to the utilization of
the specialist-KSs. If, for example, events are processed
on an earliest-occurrences-first policy, we would have a
breadth-first strategy; if events are processed on a
latest-occurrences-first policy, we would have a depth-first
strategy.
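
The correspondence between event-ordering policy and search strategy can be illustrated with a small sketch. The `process` function and the `expand` callback (which returns the new events caused by processing one event) are hypothetical and stand in for whatever the KS-activators actually do.

```python
from collections import deque


def process(initial, expand, policy):
    """Process events, taking the earliest or the latest pending one first."""
    pending = deque(initial)
    order = []
    while pending:
        if policy == "earliest-first":
            event = pending.popleft()   # queue discipline: breadth-first
        else:
            event = pending.pop()       # stack discipline: depth-first
        order.append(event)
        pending.extend(expand(event))   # newly generated events
    return order


# Hypothetical example: event "a" causes "b" and "c"; "c" causes "d".
caused_by = {"a": ["b", "c"], "b": [], "c": ["d"], "d": []}
breadth = process(["a"], caused_by.__getitem__, "earliest-first")
depth = process(["a"], caused_by.__getitem__, "latest-first")
```

Under the earliest-first policy, every event at one remove from the start is handled before any consequence of those events; under latest-first, the most recent consequence is followed to its end before older events are revisited.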

If there is more than one specialist-KS available to
process an event, some policy is needed to guide the order
in which these KSs are to be utilized. Different
KS-activators can be made to reflect different policies,
ranging from fastest-first to most-accurate-first (footnote
4). There are currently two kinds of KS on the
KS-activation level, the Event-driver and the
Expectation-driver. For each event the Event-driver
activates specialist-KSs based on the degree of
specialization (and assumed accuracy) of the KSs. The
Expectation-driver processes items on the Problems-list on
the basis of how critical the needed evidence is to the
emerging hypothesis. For the continuous-signal problem, this
evaluation of criticality is sharply defined as part of
the knowledge of the domain. In the protein-modeling
problem, however, the evaluation criteria are much more
heuristic, and in fact are just another element of the
overall analysis strategy.

The Event-driver. An event type represents an a priori
grouping of similar changes to the hypothesis; that is, it
represents an abstraction of possible changes to the
hypothesis. The changes, together with the identity of the
rules which produced the changes, are put on a globally
accessible list called the "Event-list". The Event-driver
invokes the appropriate specialist-KSs based on the
information contained in the event or group of events.
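
One way to picture the Event-driver is as a dispatch loop over the Event-list. The sketch below is an assumption-laden illustration: the `Event` and `SpecialistKS` records, the integer `specialization` score, and the most-specialized-first ordering are hypothetical stand-ins for the policy described in the text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    kind: str    # the event type (an a priori grouping of changes)
    rule: str    # identity of the rule that produced the change


@dataclass
class SpecialistKS:
    name: str
    handles: frozenset     # event types this KS responds to
    specialization: int    # higher = more specialized, assumed more accurate
    run: Callable


def event_driver(event_list, specialists):
    """Invoke matching specialist-KSs for each event, most specialized first."""
    trace = []
    while event_list:
        event = event_list.pop(0)
        matching = [ks for ks in specialists if event.kind in ks.handles]
        for ks in sorted(matching, key=lambda k: -k.specialization):
            ks.run(event)
            trace.append((event.kind, ks.name))
    return trace
```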


Expectation-driver. The task of the
Expectation-driver is to monitor the items on the
Problems-list to see if any events which might satisfy the
conditions on the Problems-list have occurred. If the
conditions have been satisfied, it will activate the
specialist-KS which had posted the request (footnote 5).
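
The monitoring behavior of the Expectation-driver might be sketched as follows. The `Problem` record, the numeric `criticality` field, and the `activate` callback are hypothetical; in particular, the text says criticality is domain knowledge for the signal problem and heuristic for the protein problem, which a single number glosses over.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    requester: str       # specialist-KS that posted the request
    needed: str          # evidence it is waiting for
    criticality: float   # how critical the evidence is (assumed numeric)


def expectation_driver(problems_list, occurred, activate):
    """Re-activate requesters whose needed evidence has now occurred."""
    still_open = []
    # Most critical needs are examined first.
    for problem in sorted(problems_list, key=lambda p: -p.criticality):
        if problem.needed in occurred:
            activate(problem.requester, problem.needed)
        else:
            still_open.append(problem)
    problems_list[:] = still_open   # satisfied items leave the list
```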

5.2.3  Strategy  Level 

The set of rules at the Strategy-level captures a
human expert's knowledge of how to solve a problem. The
task of the Strategy-KS -- the highest control level -- is
to choose the best problem-solving strategy for the current
state of the solution. Its expertise lies, first, in
determining how close the current hypothesis is to the
actual solution. In neither SU/X nor SU/P are there formal
mechanisms to measure the differences between the current
best hypothesis and the "right answer". The program
detects when the solution hypothesis is "on the right
track" by use of heuristic criteria. For example, in the
protein modeling problem a large number of connected nodes
on the stereo-substructure level may imply that the
hypothesis is approaching a solution.

A consistent inability to verify expectation-based
hypothesis elements may signal an error in the hypothesis.
A more general indication of ineffective hypothesis
formation is the consistent generation of conjectures
whose confidence values are below a threshold value, which
indicates that the analysis is "bogged down".

A Strategy-KS must also decide on a course of action
once a difference between the hypothesis and the "right
answer" is found. Note that these two functions of the
Strategy-KS -- noticing weak parts of the hypothesized
solution and choosing the appropriate corrective actions --
correspond to the situation and the action parts of
production rules. Currently, the Strategy-KS can take one
of three possible actions:

1. invoke the Expectation-driver to see if the local
needs/goals are satisfiable by recent event(s);

2. invoke the Event-driver to process the latest
changes in the hypothesis;

3. decide what region of the data space to work on
next, i.e. determine the region of minimal
ambiguity in the data.
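
Because the Strategy-KS is itself a set of situation-action rules, the choice among the three actions above can be sketched as an ordered rule list. Everything specific here is an assumption made for illustration: the `State` summary, the threshold of 0.3, the particular situations tested, and the order in which the rules are tried are not given in the text.

```python
from dataclasses import dataclass


@dataclass
class State:
    open_problems: int          # items on the Problems-list
    pending_events: int         # unprocessed items on the Event-list
    recent_confidences: list    # confidence values of recent conjectures


THRESHOLD = 0.3   # assumed value for the "bogged down" test


def bogged_down(state):
    recent = state.recent_confidences
    return bool(recent) and all(c < THRESHOLD for c in recent)


# (situation, action) pairs, tried in order; the first match fires.
STRATEGY_RULES = [
    (bogged_down, "choose-new-region"),
    (lambda s: s.open_problems > 0 and s.pending_events > 0,
     "invoke-expectation-driver"),
    (lambda s: s.pending_events > 0, "invoke-event-driver"),
]


def choose_action(state):
    for situation, action in STRATEGY_RULES:
        if situation(state):
            return action
    return "choose-new-region"   # nothing pending: move to new data
```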

6 GOAL-DIRECTED ACTIVITY: SOME SPECULATIONS


Our experience indicates that although the data-driven
and model-driven hypothesis formation methods in
combination are powerful, some situations are best handled
with a goal-driven method, i.e. utilizing a goal structure
and goal-seeking search processes. In the programs
described, the occasional lack of certain evidence can halt
the whole problem-solving process. However, the need for
missing evidence may already be known and catalogued on the
Problems-list. Under such a circumstance the obvious
solution is to set a goal for "seeking" that evidence.
Within the context of the current implementation, a
goal-directed search through the solution space can be
accomplished by: (1) adding a Goal-driver on the
KS-activation control level, (2) implementing a
backward-chaining mechanism for the rules as in the MYCIN
system [1], and (3) adding rules to the Strategy-KS to
choose between data-driven, model-driven and goal-driven
methods of hypothesis formation as appropriate.
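
To make the backward-chaining idea concrete, here is a generic textbook-style sketch. It is not MYCIN's mechanism and not part of SU/X or SU/P; the rule format (a conclusion paired with a list of premises), the `prove` function, and the example rule names are all hypothetical.

```python
def prove(goal, rules, facts, seen=frozenset()):
    """Return True if `goal` follows from `facts` by chaining backward."""
    if goal in facts:
        return True
    if goal in seen:
        return False   # already being sought on this path; avoid cycles
    for conclusion, premises in rules:
        if conclusion == goal and all(
            prove(p, rules, facts, seen | {goal}) for p in premises
        ):
            return True
    return False


# Hypothetical rules: each pairs a conclusion with the evidence it needs.
RULES = [
    ("helix", ["periodic-density", "backbone"]),
    ("backbone", ["skeleton"]),
]
FACTS = {"periodic-density", "skeleton"}
```

A Goal-driver could take an item from the Problems-list as the goal and use such a search to find which primary evidence would have to be established.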

7 SUMMARY  AND  CONCLUDING  REMARKS 

SU/X and SU/P are two application programs that have
been written to reason toward an understanding of digitized
physical signals. The essential features of the programs'
design are: (1) data- and model-driven, opportunistic modes
of hypothesis formation in which the "control" is organized
hierarchically, and (2) a globally accessible hypothesis
structure augmented by focus-of-attention and historical
information which serve to integrate diverse sources of
knowledge. The basic design is similar in many ways to the
HEARSAY-II Speech Understanding System design. It is
applicable to many different types of problems, especially
to those problems that do not have computationally feasible
"legal move generators" and must therefore resort to
opportunistic generation of alternate hypotheses.

The use of production rules to represent
control/strategy knowledge offers the advantages of
uniformity of representation and accessibility of knowledge
for purposes of augmentation and modification of the
knowledge base. Because the line-of-reasoning is often a
complex compounding of the elemental steps indicated by the
rules, a dynamic explanation capability is needed. We did
not discuss this important feature of the programs. Nor
did we discuss the facility which allows assignment of an
expert's degree of uncertainty for each rule entered. The
use of this facility is not well developed currently in the
programs discussed. (See References 8 and 9 for similar
but better developed capabilities in the MYCIN program.)
We believe that facilities for explanation and for inexact
inference must be integrated into the program design at the
initial stages.

Footnotes


1. SU/X was implemented in the context of a military
signal-understanding application. It is a large
INTERLISP program that performed well on a variety of
complex signal-interpretation tasks within the domain.
SU/P, also written in INTERLISP, is under development.

2. The events are stored in three lists, each of which
requires its own special treatment: knowledge-based
events, i.e. events specifically related to changes
in the hypothesis; time-based events, i.e. those
events specifically related to expectations of "what
will happen when"; and problems, i.e. expectations
from the programs' "model of the situation" for which
the clinching confirmatory or disconfirmatory evidence
has not yet been found.

3. As mentioned earlier, the design of the hypothesis
structure in SU/X and SU/P is based on the concepts
found in HEARSAY-II. We refer you to [4,7] for a more
detailed description.

4. The issues of focus of attention and resource
allocation policies, as described by Hayes-Roth and
Lesser [6], are important ones. A subsequent paper
will describe the implementation of these policies
within the SU/X and SU/P framework.

5. The problems which are "need-for-evidence" can be
viewed as "subgoals-to-be-achieved". The systems are
currently biased toward an opportunistic mode of
hypothesis formation, and the implicit strategy for
such subgoals is "wait and see".

[Figure residue: the diagrams were not recovered; surviving
labels include "Topological knowledge" and "-D Model".]

* The nodes represent hypothesis elements.
The arrows represent KSs which infer hypothesis element(s)
on one level from hypothesis elements on another level.

Knowledge sources listed in the figure:

0. Skeletonization
1. Helix Identification (Skeletal)
2. Sidechain Identification
3. Bond Rotation
4. Sequence Identification
5. Helix Identification (Topological)
6. Cofactor Identification
7. "Heavy Atoms" Identification

Figure 2. Knowledge Source Utilization in Hypothesis Formation

Hypothesis Construction in the Protein Modeling Problem

Figure 4. Relationship between Hypothesis Hierarchy and
Control Hierarchy

Table 1. Summary of KS Activities on Different Control Levels

Specialist-KS (on Hypothesis-formation Level)

  Has access to:
    1. primary data,
    2. hypothesis elements,
    3. facts, and
    4. events in the Event history list.

  May act to:
    1. change the values of attributes of hypothesis elements, or
    2. change the links (relationships) in the hypothesis structure, and
    3. inform the system of its actions by:
       a. putting on the Eventlist the type of changes that were made, or
       b. putting unresolved problems on the Problems-list, or
       c. asking to be recalled at a later time (generating a
          time-based event).

Event- and Expectation-Drivers (on Knowledge-Source-Activation Level)

  Has access to:
    1. events on the Eventlist,
    2. items on the Problems-list, and
    3. time-based events.

  May act to:
    invoke appropriate Specialist-KSs in an appropriate sequence
    to reflect its resource allocation policy.

Strategy-KS (on Strategy Level)

  Has access to:
    1. Eventlist,
    2. Problems-list,
    3. time-based events,
    4. Current-Best-Hypothesis (or a summary of CBH if available), and
    5. Event- and Expectation-Drivers.

  May act to:
    1. choose the appropriate KSs on the KS-Activation level, and/or
    2. change the focus of attention (i.e. choose an event, a problem,
       a dormant region of the hypothesis, or a different region of the
       data to process next).